CS 598: Theoretical Machine Learning
Abstract
We have already seen in previous lectures that a concept class C is PAC-learnable by the hypothesis class H if there exists an algorithm A such that for all c ∈ C, for all distributions D over X, for all ε > 0, and for all δ > 0, A takes a finite number m of examples given in the form S = 〈(x1, c(x1)), ..., (xm, c(xm))〉, where each xi is drawn from the space X at random according to the target distribution D, and produces a hypothesis h ∈ H such that Pr[errD(h) ≤ ε] ≥ 1 − δ.

After going through the definition, there are a few questions that can be asked, such as: what if we don't have an algorithm that works for every ε and every δ? For instance, what if we have an algorithm that only works for δ = 1/3? This algorithm guarantees a success probability of 2/3. Technically it is not a PAC-learning algorithm, because one cannot give an arbitrary success probability as input and expect to achieve success with that probability. Can we use this algorithm, which only has a success probability of 2/3, to get an arbitrary success probability? One possible way is to run the algorithm many times: by a Chernoff-bound argument, if it is run O(log(1/δ)) times, then with probability ≥ 1 − δ at least one of the runs outputs a good hypothesis.

Let's look at another question: suppose there is an algorithm that cannot achieve every value of ε (the error of the algorithm), i.e. it cannot achieve arbitrarily small error. For instance, say we have an algorithm that only gets ε = 0.49. In binary classification, random guessing achieves error 1/2, so this algorithm performs only slightly better than random guessing. Can we use this algorithm to achieve arbitrary accuracy?

The algorithm just described is known as a weak PAC-learning algorithm: for all distributions D over X and for all δ > 0, the algorithm A uses finite data and produces a hypothesis h ∈ H such that Pr[errD(h) ≤ 1/2 − γ] ≥ 1 − δ, for some fixed γ > 0. Such a learner does better than random guessing and is easier to design than an algorithm that does very well on the entire instance space X, i.e. one that gets error close to 0 (a harder design problem). It is clear from the previous argument that strong PAC learning implies weak PAC learning. But can we say that weak learning implies strong learning? In fact, as we will see in the following section, it is true: by combining weak learners in a specific way we can obtain a strong learner.

Let's see an example, where the hypothesis class H is the class of boolean disjunctions. We have n boolean variables (x1, x2, ..., xn), the target function is some disjunction f* = (x1 ∨ x2 ∨ ... ∨ xk), and we are trying to learn this disjunction.
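To make the confidence-boosting argument above concrete, here is a minimal Python sketch of turning a learner that succeeds with probability 2/3 into one that succeeds with probability 1 − δ. The interfaces (learn, draw_sample) and the sample sizes are hypothetical, not part of the lecture; the point is only the structure of the argument: O(log(1/δ)) independent runs, followed by selection on a fresh validation sample.

    import math

    def boost_confidence(learn, draw_sample, delta, m_train=1000, m_validate=1000):
        """Amplify a learner that succeeds with probability >= 2/3 into one
        that succeeds with probability >= 1 - delta (hypothetical interface).

        learn(sample)  -> hypothesis h, a callable mapping x to {0, 1}
        draw_sample(m) -> list of m labeled pairs (x, y) drawn i.i.d. from D
        """
        # Run the learner on O(log(1/delta)) independent samples; each run
        # fails with probability <= 1/3, so the chance that every run fails
        # is at most (1/3)^k <= delta/2 for k on the order of log(1/delta).
        k = max(1, math.ceil(2 * math.log(2.0 / delta)))
        candidates = [learn(draw_sample(m_train)) for _ in range(k)]

        # Pick the candidate with the smallest error on a fresh validation
        # sample; by a Chernoff bound this selection is reliable for large
        # m_validate, adding at most another delta/2 failure probability.
        validation = draw_sample(m_validate)
        def empirical_error(h):
            return sum(1 for x, y in validation if h(x) != y) / len(validation)
        return min(candidates, key=empirical_error)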
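For the disjunction example, one standard consistent learner is the elimination algorithm. The sketch below is an illustration under the assumption that the disjunction uses un-negated (monotone) literals, which matches the target f* = (x1 ∨ ... ∨ xk) above; the function names are hypothetical.

    def learn_monotone_disjunction(sample, n):
        """Elimination algorithm for monotone disjunctions over n boolean variables.

        sample: list of (x, y) pairs, where x is a tuple of n bits and y = f*(x).
        Returns the set of variable indices kept in the learned disjunction.
        """
        # Start with every variable as a candidate literal of the disjunction.
        candidates = set(range(n))
        for x, y in sample:
            if y == 0:
                # On a negative example, no literal of the target is satisfied,
                # so any variable set to 1 here cannot appear in f*; eliminate it.
                candidates -= {i for i in range(n) if x[i] == 1}
        return candidates

    def predict(candidates, x):
        # The learned disjunction fires iff some remaining candidate variable is 1.
        return int(any(x[i] for i in candidates))

The returned hypothesis is consistent with the sample: it never fires on the negative examples (every variable that would fire has been eliminated), and on positive examples the true literals of f* are never eliminated, so at least one of them remains and makes the disjunction fire.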